Regroup all multiswebench hyperparameters in constants.py #369
Summary
This PR creates a single source of truth for all Multi-SWE-Bench constant values and hyperparameters by introducing benchmarks/multiswebench/constants.py.
Fixes #366
Changes
1. Created benchmarks/multiswebench/constants.py
A new module containing all constant values organized into logical categories:
- DEFAULT_DATASET, DEFAULT_SPLIT, DEFAULT_LANGUAGE, DEFAULT_MODEL_NAME, DEFAULT_VERSION
- DEFAULT_DOCKER_IMAGE_PREFIX, DEFAULT_BUILD_TARGET, and environment variable names
- DEFAULT_RUNTIME_API_URL, DEFAULT_STARTUP_TIMEOUT, boolean defaults, and environment variable names
- DEFAULT_EVAL_MODE, DEFAULT_MAX_WORKERS, DEFAULT_LOG_LEVEL, FIX_PATCH_RUN_CMD, etc.
- DATASET_CACHE_DIR, DEFAULT_WORKING_DIR
- DEFAULT_ENV_SETUP_COMMANDS
2. Updated all multiswebench modules to import from constants.py
- build_images.py
- download_dataset.py
- eval_infer.py
- run_infer.py
- scripts/data/data_change.py
- scripts/eval/update_multi_swe_bench_config.py
3. Added comprehensive tests
Created tests/test_multiswebench_constants.py with 28 tests.
Testing
All 28 new tests pass.
Existing tests continue to pass.
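For illustration, a test over a constants module might look like the sketch below. It is not taken from tests/test_multiswebench_constants.py: the constant names come from this PR description, but the values and the test function names are hypothetical, and a SimpleNamespace stands in for the real module so the snippet runs on its own.

```python
from types import SimpleNamespace

# Hypothetical stand-in for benchmarks/multiswebench/constants.py.
# The names appear in the PR description; the values are placeholders.
constants = SimpleNamespace(
    DEFAULT_SPLIT="test",
    DEFAULT_MAX_WORKERS=4,
    DEFAULT_LOG_LEVEL="INFO",
)


def test_string_defaults_are_non_empty():
    # String defaults should be usable without further validation.
    for name in ("DEFAULT_SPLIT", "DEFAULT_LOG_LEVEL"):
        value = getattr(constants, name)
        assert isinstance(value, str) and value


def test_max_workers_is_positive_int():
    # A worker count of zero or below would make no sense as a default.
    assert isinstance(constants.DEFAULT_MAX_WORKERS, int)
    assert constants.DEFAULT_MAX_WORKERS > 0
```

Under pytest, functions named test_* are collected automatically; in the real suite the stand-in would presumably be replaced by an import of the constants module.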
Benefits
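The single-source-of-truth idea from the Summary can be sketched as follows: call sites read named defaults, and the names of environment variables, from one place instead of repeating literals. This is a minimal sketch under stated assumptions. DEFAULT_MAX_WORKERS and DEFAULT_LOG_LEVEL are names from this PR, while their values, the ENV_LOG_LEVEL name, and both helper functions are hypothetical.

```python
import os

# Hypothetical values; only the constant names appear in the PR description.
DEFAULT_MAX_WORKERS = 4
DEFAULT_LOG_LEVEL = "INFO"
# Hypothetical environment variable name, for illustration only.
ENV_LOG_LEVEL = "MULTISWEBENCH_LOG_LEVEL"


def resolve_log_level(environ=None):
    """Return the log level from the environment, else the shared default."""
    env = os.environ if environ is None else environ
    return env.get(ENV_LOG_LEVEL, DEFAULT_LOG_LEVEL)


def resolve_max_workers(cli_value=None):
    """Prefer an explicit value; otherwise fall back to the shared default."""
    return DEFAULT_MAX_WORKERS if cli_value is None else cli_value
```

With this layout, changing a default means editing one line in the constants module rather than searching every script for the same literal.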